Abstract
Context representations are a key element in distributional models of word meaning. A recently proposed approach represents the context of a target word by a substitute vector. In this work, the authors propose a variant of substitute vectors that is suitable for measuring context similarity, and then a novel model for representing word meaning in context based on this context representation.
Introduction
A context of a word instance is typically represented by an unordered collection of its first-order neighboring words, called bag-of-words (BOW). In contrast, Yatbaz et al. (2012) proposed to represent this context as a second-order substitute vector. The main contribution of this paper is a model for word meaning in context that is based on substitute vector context representations instead of the traditional bag-of-words representations.
Modeling word meaning
Word meaning out-of-context
They define the out-of-context representation for target word type $u$ as an average of the substitute vectors of its contexts:
$\overrightarrow{p_u} = \frac{1}{|C_u|} \sum_{i \in C_u} \overrightarrow{S_i}$
where $C_u$ is a collection of the contexts observed for target word type $u$ in a learning corpus, and $\overrightarrow{S_i}$ are their substitute vectors.
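A minimal sketch of this averaging, assuming the substitute vectors of the observed contexts are already computed and stacked as rows of a NumPy array (the names are illustrative, not from the paper):

```python
import numpy as np

def out_of_context_representation(substitute_vectors):
    """Average the substitute vectors of all observed contexts of a target word.

    substitute_vectors: array of shape (|C_u|, V), one substitute vector per
    context of the target word type u observed in the learning corpus.
    """
    return np.asarray(substitute_vectors, dtype=float).mean(axis=0)
```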
Word meaning in context
They would like to alter the out-of-context representation by, ideally, averaging only over contexts that induce a word sense similar to that of the given context.
To approximate this objective, they use a weighted average of all contexts of $u$, where contexts are weighted according to their similarity to the given context $c$:
$\overrightarrow{p_{u,c}} = \frac{1}{Z} \sum_{i \in C_u} sim(c,i) \cdot \overrightarrow{S_i}$
where $Z = \sum_{i \in C_u} sim(c,i)$ normalizes the weights so that this is a proper weighted average.
Compared to the out-of-context representation, this one is sensitive to the context similarity scores, which bias the representation towards the given context. It is in-context in the sense that the given context is used to build the vector.
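A minimal sketch of this weighted average, assuming the substitute vectors and the similarity scores $sim(c,i)$ have already been computed (array names are illustrative):

```python
import numpy as np

def in_context_representation(substitute_vectors, similarities):
    """Similarity-weighted average of the substitute vectors of u's contexts.

    substitute_vectors: array of shape (|C_u|, V).
    similarities: array of shape (|C_u|,) holding sim(c, i) of the given
    context c against every stored context i of u.
    """
    weights = np.asarray(similarities, dtype=float)
    vectors = np.asarray(substitute_vectors, dtype=float)
    Z = weights.sum()                        # normalize over the weights
    return (weights[:, None] * vectors).sum(axis=0) / Z
```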
Evaluating Context Representations
Task Description
Given a word-window context $c$ of a target word $u$, they wish to evaluate context similarity measures on their ability to retrieve other contexts of $u$ from $C_u$ that induce a similar sense.
Performing such an evaluation requires a dataset of target words with thousands of sense-tagged contexts in $C_u$ for each target word $u$.
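As a rough illustration of this retrieval setup, assuming each context is represented by a vector such as its substitute vector, with cosine similarity standing in for one of the compared similarity measures and precision-at-k as an illustrative retrieval metric (the metric here is a choice made for the sketch, not necessarily the paper's):

```python
import numpy as np

def rank_contexts(query_vec, context_vecs):
    """Rank stored contexts of u by cosine similarity to the query context."""
    q = query_vec / np.linalg.norm(query_vec)
    C = context_vecs / np.linalg.norm(context_vecs, axis=1, keepdims=True)
    return np.argsort(-(C @ q))              # most similar context first

def precision_at_k(ranking, sense_tags, query_sense, k=100):
    """Fraction of the top-k retrieved contexts that share the query's sense."""
    return float(np.mean([sense_tags[i] == query_sense for i in ranking[:k]]))
```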
Pseudo-word methods
Because no sense-tagged dataset of this size is available, they use a pseudo-word method, which considers a set of real words as the pseudo-senses of an artificial pseudo-word.
- pseudo-word: Sample from the learning corpus.
- pseudo-senses: For the pseudo-word, use WordNet to identify all of the word's synsets, and for each choose the least polysemous word which occurs at least 1,000 times in the corpus as one of the pseudo-sense words (sketched below).
- Mixed contexts: Sample from the corpus 1,000 contexts of each pseudo-sense of a pseudo-word.
- query context: Sample a single context from the mixed contexts and then rank the remaining contexts according to each of the compared context similarity measures.
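A minimal sketch of the pseudo-sense selection step, assuming NLTK's WordNet interface; `corpus_freq`, a dict mapping words to their counts in the learning corpus, is a hypothetical input:

```python
from nltk.corpus import wordnet as wn

def pseudo_sense_words(word, corpus_freq, min_count=1000):
    """For each WordNet synset of `word`, pick the least polysemous lemma
    that occurs at least `min_count` times in the learning corpus."""
    senses = set()
    for synset in wn.synsets(word):
        candidates = [lemma.name() for lemma in synset.lemmas()
                      if corpus_freq.get(lemma.name(), 0) >= min_count]
        if candidates:
            # fewest WordNet synsets == least polysemous
            senses.add(min(candidates, key=lambda w: len(wn.synsets(w))))
    return senses
```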
- Sample 100 words randomly from the learning corpus, ukWaC.
- Use WordNet to identify all of each word's synsets.
- For each synset, choose the surface word which is the least polysemous yet occurs in the learning corpus at least 1,000 times, as a representative for this synset.
- Create a pseudo-word whose pseudo-senses are the set of the representative words.
- Sample from the learning corpus 1,000 contexts for each pseudo-sense word, and for each pseudo-word mix together all contexts of its pseudo-sense words; record the original pseudo-sense word of each context as its sense tag (see the sketch after this list).
- For each pseudo-word, sample a single query context from all of its mixed contexts and then rank the remaining contexts according to each of the compared context similarity measures.
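Pulling the last two steps together, a sketch of how the mixed, sense-tagged context pool and the query context might be constructed; `sample_contexts` is a hypothetical helper that draws contexts of a word from the learning corpus, not something from the paper:

```python
import random

def build_pseudo_word_dataset(sense_words, sample_contexts, n_per_sense=1000):
    """Mix the contexts of all pseudo-sense words into one sense-tagged pool
    and hold out a single query context.

    sense_words: the set of real words acting as senses of one pseudo-word.
    sample_contexts: hypothetical helper returning n contexts of a word.
    """
    mixed = []
    for sense in sense_words:
        for context in sample_contexts(sense, n_per_sense):
            mixed.append((context, sense))   # the source word is the sense tag
    random.shuffle(mixed)
    query, pool = mixed[0], mixed[1:]        # one query context vs. the rest
    return query, pool
```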